1. Introduction

Research shows that climate change has already had multiple observable impacts on the environment. Glaciers have shrunk, and a number of animal and plant species are in danger of extinction. Such impacts can fundamentally transform whole ecosystems and the intricate webs of life they support. Climate change also significantly affects our livelihoods, health, and future.

There is no more time to wait. Although we cannot stop climate change overnight, we can still slow its pace. To do so, we must first understand how the climate is changing and why.

Thus, in this research, we examine some representative scientific evidence of climate change and attempt to model and extrapolate the global mean temperature using several prediction models.

Reference: Modeling global temperature changes with genetic programming

2. Gather the Data

The data is stored in the data directory, which contains 7 files, listed below:

3. Data preprocessing

3.1 Import Libraries

3.2 Read data

Because the data is monthly, we normalize every timestamp to the first day of its month.
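The month-start normalization described above can be sketched with pandas as follows. The column names (`date`, `value`, `ds`), the sample rows, and the helper `to_month_start` are illustrative assumptions, not the notebook's actual code.

```python
import pandas as pd

# Hypothetical monthly records whose timestamps fall on arbitrary days.
raw = pd.DataFrame({
    "date": ["2020-01-15", "2020-02-28", "2020-03-31"],
    "value": [0.10, 0.25, 0.40],
})


def to_month_start(dates: pd.Series) -> pd.Series:
    """Snap each timestamp to the first day of its month."""
    # Convert to monthly periods, then back to timestamps (period start).
    return pd.to_datetime(dates).dt.to_period("M").dt.to_timestamp()


raw["ds"] = to_month_start(raw["date"])
print(raw[["ds", "value"]])
```

Aligning every source on the same month-start key makes the later merge a plain join on that column.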

Read temperature data

Read AMO data

Read CO2 data

We use the smoothed, seasonally adjusted CO2 values:

Read NAO data

Read sunspot data

Read volcanic data

Merge data

Correlation matrix

2. FBprophet

Reference: https://facebook.github.io/prophet/docs/quick_start.html

Univariate model

Cross validation:

Multivariate model with extra regressors and holidays

Hyperparameter tuning

Final model

3. Random Forest

Random Search Cross Validation

Hyperparameters for the Random Forest model can be tuned with Scikit-Learn's randomized search cross-validation (RandomizedSearchCV). It randomly samples parameter combinations from a predefined parameter grid, evaluates each one with K-fold cross-validation, and returns the best-scoring combination.
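A minimal sketch of this randomized search is shown below. The synthetic features, the parameter grid, and the settings `n_iter=5` and `cv=3` are assumptions chosen only to keep the example small; they are not the values used in this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the merged feature table (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=120)

# Hypothetical grid to sample from.
param_distributions = {
    "n_estimators": [20, 50, 100],
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

random_search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                          # number of sampled combinations
    cv=3,                              # 3-fold cross-validation
    scoring="neg_mean_squared_error",
    random_state=0,
)
random_search.fit(X, y)
print(random_search.best_params_)
```

For time-ordered data, passing a `TimeSeriesSplit` instance as `cv` may be a safer choice than plain K-fold, since it avoids validating on observations that precede the training window.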

Grid Search Cross Validation

Random search narrows down the promising range for each parameter. We can then search around those values with Scikit-Learn's grid search (GridSearchCV), which evaluates every combination in a parameter grid:
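As a rough illustration, suppose the random search had pointed to `max_depth=5` and `min_samples_leaf=2` (hypothetical values). A grid centered on them could then be searched as below, again on synthetic data rather than the study's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the merged feature table (assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=120)

# Narrow grid around the hypothetical random-search result.
param_grid = {
    "max_depth": [4, 5, 6],
    "min_samples_leaf": [1, 2, 3],
}

grid_search = GridSearchCV(
    estimator=RandomForestRegressor(n_estimators=30, random_state=0),
    param_grid=param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
grid_search.fit(X, y)  # fits all 3 x 3 = 9 combinations on every fold
print(grid_search.best_params_, grid_search.best_score_)
```

Because every combination is fitted, the cost grows multiplicatively with each added parameter, which is why the grid is kept narrow at this stage.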

Evaluation

3.3 SARIMAX

It is important to evaluate the stationarity of a time series before modeling it. SARIMAX is well suited to time series with strong seasonality and trend. If the series does not show such seasonality, ARIMA would be a better choice.

Check whether the time series is stationary with the Augmented Dickey-Fuller (ADF) test:

The p-value is greater than 0.05, so we cannot reject the null hypothesis of a unit root, and we treat the time series as non-stationary. Let's difference it:

Here we see similar patterns in both the ACF and PACF plots, which give no strong evidence for choosing the AR and MA terms. Both plots show two spikes, indicating that AR(2) and MA(2) terms might be good candidates. Also, both the ACF and PACF tail off toward 0, so ARIMA(1,1,1) might also be a good fit.

However, we will fit SARIMAX to the original time series and perform an exhaustive grid search over candidate orders to select the best-scoring model.

3.4 XGBoost